Abstract

The analysis of cross-correlations is widely applied to understand interconnections in stock markets and to estimate portfolio risk. Existing studies of correlations in the Chinese market focus mainly on the static correlations between return series, which calls for an investigation of their dynamic correlations. Our study aims to reveal the dynamic evolution of cross-correlations in the Chinese stock market and to offer an exact interpretation of this evolution. The correlation matrices constructed from the return series of 367 A-share stocks traded on the Shanghai Stock Exchange from January 4, 1999 to December 30, 2011 are calculated over a moving window with a size of 400 days. The evolution of the statistical properties of the correlation coefficients, eigenvalues, and eigenvectors of the correlation matrices is carefully analyzed. We find that stock correlations increase significantly during the two market crashes in 2001 and 2008, when only five eigenvalues deviate significantly from those of the random correlation matrix, and that systemic risk is higher in these volatile periods than in calm periods. By investigating the significant contributors to the deviating eigenvectors in different time periods, we observe a dynamic evolution in business sectors such as IT, electronics, and real estate, which lead the rise (drop) before (after) the crashes. Our results provide new perspectives for understanding the dynamic evolution of cross-correlations in the Chinese stock market, and the risk estimates are valuable for risk management.
Introduction

The stock market is a typical complex system with interactions between individuals, groups, and institutions at different levels. In financial crises, risk can quickly propagate among these interconnected institutions, which have mutually beneficial business relationships. Therefore, the analysis of the correlations between shares issued by different institutions is of crucial importance for understanding the interactive mechanism of the stock market and for portfolio risk estimation [1-3]. A variety of work has been done to reveal the information contained in the internal correlations among stocks, and the methods generally used in research on stock cross-correlations include random matrix theory (RMT) [4,5], principal component analysis (PCA) [6-8], and the hierarchical structure [9-17].

Random matrix theory (RMT), originally developed for complex quantum systems, was applied by Plerou et al. to analyze the cross-correlations between stocks in the U.S. stock market [4]. The statistics of most of the eigenvalues of the correlation matrix calculated from stock return series agree with the predictions of RMT, but a few of the largest eigenvalues deviate. Extended work has been conducted to explain the information contained in the deviating eigenvalues [18], which reveals that the largest eigenvalue corresponds to a market-wide influence on all stocks and the remaining deviating eigenvalues correspond to conventionally identified business sectors. Additional work has shown that even the eigenvalues within the spectrum of RMT carry some correlations [19,20]. Using the same RMT method, extensive work has been performed on the correlation analysis of various stock markets [21-30].

In recent years, an increasing number of studies have concentrated on the variation of the cross-correlations between market equities over time [31-40]. Aste et al. investigated the evolution of the correlation structure among 395 stocks quoted on the U.S. equity market from 1996 to 2009, in which the links among stocks are built by a topologically constrained graph approach [34]. They find that stocks have increased correlations in periods of larger market instability. Using a similar filtered graph approach, the correlation structure among 57 different market indices all over the world has been studied [37]. Fenn et al. used the RMT method to analyze the time evolution of the correlations between the market equity indices of 28 geographical regions from 1999 to 2010 [38], and they also observe an increase in the correlations between several different markets after the credit crisis of 2007-2008. Similar results have also been observed in Refs. [31,32,35,41,42].

The RMT method has been applied to the analysis of the static correlations between the return series in the Chinese stock market [26]. No clear interactions between stocks in the same business sectors are observed, while unusual sectors containing the ST (specially treated) and Blue-chip stocks are identified by a few of the largest eigenvalues. Further work has been done to analyze the anti-correlated sub-sectors that compose the unusual sectors [43]. To the best of our knowledge, not much work has been conducted so far on the dynamics of stock correlations in the Chinese market. Using the daily records of 259 stocks on the Chinese stock market from 1997 to 2007, the dynamic evolution of the Chinese stock network was first analyzed in [36]. In that work, links are constructed between stocks whose correlations are larger than a threshold, and a stable topological structure is revealed by using a dynamic threshold instead of a static threshold. Although additional efforts are made to identify the economic sectors based on the RMT method, the dynamic effects of conventional business sectors are extremely weak.

Figure 1. Dynamics of correlation coefficient distribution. Probability density function (PDF) $P(c_{ij})$ of the correlation coefficients calculated from the return series of 367 A-Share stocks evolved with the time t.
doi:10.1371/journal.pone.0097711.g001

Figure 2. Dynamics of mean correlation coefficient. Mean correlation coefficient $\langle c_{ij} \rangle$ and mean absolute return $\langle |G_{\text{A-share}}| \rangle$ of the A-share Index of Shanghai Stock Exchange evolved with the time t. The black solid line in the middle of the shadow shows $\langle c_{ij} \rangle$ calculated from the return series of 367 A-Share stocks within a moving window of length 400 days, and the solid lines at the top and bottom of the shadow are $\langle c_{ij} \rangle + \sigma(c_{ij})$ and $\langle c_{ij} \rangle - \sigma(c_{ij})$, where $\sigma(c_{ij})$ is the standard deviation of the correlation coefficients. The red dashed line shows $\langle |G_{\text{A-share}}| \rangle$ of the A-share Index of Shanghai Stock Exchange calculated from the daily records of the A-share Index within a moving window of length 100 days, scaled by a factor 10.
doi:10.1371/journal.pone.0097711.g002

Figure 3. Dynamics of eigenvalue distribution. Probability density function (PDF) $P(\lambda)$ of the eigenvalues obtained from the correlation matrix of the return series of 367 A-Share stocks evolved with the time t.
doi:10.1371/journal.pone.0097711.g003

Figure 4. Enlarged eigenvalue distribution. Partial enlarged drawing of eigenvalue distribution for $\lambda > 20$.
doi:10.1371/journal.pone.0097711.g004

Figure 5. Comparison between the eigenvalues of empirical correlation matrix and random correlation matrix. Probability density function (PDF) $P(\lambda)$ of the eigenvalues of the correlation matrix constructed from the return series of 367 A-Share stocks in the first moving window from 04/01/1999 to 06/09/2000. The dotted line is the analytical result of RMT obtained from equation (3). The inset shows the largest eigenvalue $\lambda_1$ of the empirical return series, which is much larger than the upper bound $\lambda_{\max}$ of RMT.
doi:10.1371/journal.pone.0097711.g005

Figure 6. Comparison between the eigenvalues of empirical correlation matrix and surrogate random correlation matrix. The black circled line is the 99th percentile of the eigenvalues of the shuffled return series. The return series in each moving window is randomized by shuffling 10 times. The red squared line is the number of the empirical eigenvalues significantly larger than those of the shuffled data, which are the eigenvalues larger than the 99th percentile of the eigenvalues of the shuffled data. The blue solid line shows the A-share Index of Shanghai Stock Exchange, scaled by a factor 0.005.
doi:10.1371/journal.pone.0097711.g006

Figure 7. Dynamics of Cumulative Risk Fraction (CRF). CRF is measured by the eigenvalues obtained from the correlation matrix of the return series of 367 A-Share stocks within a moving window of length 400 days. Different lines correspond to the proportions of total variance explained by PC 1, PC 1-10, PC 1-50, and PC 1-367. PC 1 denotes the principal component corresponding to the largest eigenvalue $\lambda_1$.
doi:10.1371/journal.pone.0097711.g007

Principal component analysis (PCA) is another method commonly used to detect the correlations between stock returns. It is closely related to the RMT method, since it is also performed through eigenvalue decomposition of the correlation (or covariance) matrix of the return series. This method uses an orthogonal transformation to convert a set of possibly correlated returns into several uncorrelated components, which are ranked by their explanatory power for the total variance of the system. Studies of correlations among stock returns based on the PCA method are primarily concerned with systemic risk measures [6-8].

In this paper, mainly using the RMT method, we study the dynamic evolution of the correlations between the 367 A-share stocks traded on the Shanghai Stock Exchange from 1999 to 2011. The internal correlations between the stocks are investigated based on the correlation matrix of the return series of individual stocks in a moving window with a fixed length. We are mainly concerned with the statistical properties of the correlation coefficients, eigenvalues, and eigenvectors of the correlation matrix, and their variations in different time periods. Our results confirm the strong collective behavior of the stock returns in the periods of market crashes, which is verified by the observations of the distribution of the correlation coefficients and the mean correlation coefficient. Further, based on the PCA method we calculate the proportion of total variance explained by the first n components, through which the systemic risk of the Chinese stock market is estimated for different time periods. Another important purpose of our study is to extract the information contained in the eigenvectors deviating from RMT. We find that the eigenvector of the largest eigenvalue quantifies a market-wide influence on all stocks, and this market mode remains stable over time. For the other deviating eigenvectors, dynamic evolutions of several conventional industries, including IT, electronics, machinery, petrochemicals, and real estate, are clearly observed.

Materials and Methods 

The database analyzed in our study contains the daily data of all A-Share stocks traded on the Shanghai Stock Exchange (SHSE), one of the two stock exchanges in mainland China. A-Share stocks are issued by mainland Chinese companies and traded in Chinese Yuan. The data are provided by Beijing Gildata RESSET Data Technology Co., Ltd, see http://www.resset.cn/. To better understand the correlation structures under different market conditions, we select the A-share stocks traded on the Shanghai Stock Exchange from January 4, 1999 to December 30, 2011, a period covering the two big crashes in 2001 and 2008. To make sure that the stocks have enough trading days for statistically significant results, we select the stocks traded on the exchange for at least 2600 days, i.e., we exclude stocks suspended from the market for more than about two years. This filter yields sample data comprising 367 A-Share stocks and 1114364 daily records in total.

Before we quantify the cross-correlations between stocks, we 
first calculate the return series for a given stock i as 

$$G_i(t) = \ln p_i(t) - \ln p_i(t-1), \qquad (1)$$
Table 1. Basic information about the A-M conventional industries.

| Industry code | Industry | Number of stocks |
|---|---|---|
| A | Agriculture | 4 |
| B | Mining | 7 |
| C0 | Food & beverage | 21 |
| C1 | Textiles & apparel | 12 |
| C2 | Timber & furnishings | 0 |
| C3 | Paper & printing | 5 |
| C4 | Petrochemicals | 31 |
| C5 | Electronics | 10 |
| C6 | Metals & non-metals | 26 |
| C7 | Machinery | 52 |
| C8 | Pharmaceuticals | 19 |
| C99 | Other manufacturing | 2 |
| D | Utilities | 19 |
| E | Construction | 6 |
| F | Transportation | 14 |
| G | IT | 18 |
| H | Wholesale & retail trade | 54 |
| I | Finance & insurance | 2 |
| J | Real estate | 43 |
| K | Social services | 11 |
| L | Communication & cultural industry | 4 |
| M | Comprehensive | 7 |

The conventional industries are grouped based on the China Securities Regulatory Commission (CSRC) industry code. The basic information includes the industry code, full name of the industry, and the number of chosen stocks belonging to each industry.
doi:10.1371/journal.pone.0097711.t001



where $p_i(t)$ is the price of stock i at time t, and t is in units of one day. The Pearson correlation coefficient between two stock return series $G_i(t)$ and $G_j(t)$ is defined as

$$c_{ij} = \frac{\langle [G_i(t) - \langle G_i(t) \rangle][G_j(t) - \langle G_j(t) \rangle] \rangle}{\sigma_i \sigma_j}, \qquad (2)$$

where $\sigma_i$ and $\sigma_j$ are the standard deviations of the two stock return series. It is a common measure of the dependence between the return series of the two stocks. There are N = 367 sample stocks, so we have a correlation matrix C with 367 x 367 correlation coefficients as elements. The elements of the correlation matrix are restricted to the domain $-1 \le c_{ij} \le 1$: for $0 < c_{ij} \le 1$ the stocks are correlated, for $-1 \le c_{ij} < 0$ the stocks are anti-correlated, and for $c_{ij} = 0$ the stocks are uncorrelated.

The cross-correlation defined above measures the dependence between the return series over the whole period of the sample data. We are more interested in the dynamic variation of the stock correlations with time t, so we look at the correlations calculated over a moving window. The size T of the moving window is fixed at 400 trading days, i.e., about two years, which is slightly larger than the number of sample stocks. Equation (2) is applied to calculate the correlation coefficients over the subset of return series within the moving window $[t - T + 1, t]$. For instance, the correlations in the first moving window are computed from the return series within $[1, T]$, and those in the following window from $[2, T + 1]$. Given that our sample data run from 04/01/1999 to 30/12/2011, the starting date of the moving window covers the period from 04/01/1999 to 12/05/2010, and the ending date runs from 06/09/2000 to 30/12/2011.
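The return definition of Equation (1), the Pearson coefficient of Equation (2), and the moving-window procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names, the synthetic prices, and the assumption of a complete price array with no missing days are all hypothetical.

```python
import numpy as np

def log_returns(prices):
    """Equation (1): G_i(t) = ln p_i(t) - ln p_i(t-1).

    prices: array of shape (n_days, n_stocks), strictly positive.
    Returns an array of shape (n_days - 1, n_stocks).
    """
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=0)

def rolling_correlations(returns, window):
    """Yield (t, C), where C is the Pearson correlation matrix of
    Equation (2) over the rows [t - window + 1, t] (0-based index t)."""
    returns = np.asarray(returns, dtype=float)
    for t in range(window - 1, returns.shape[0]):
        block = returns[t - window + 1 : t + 1]
        # rowvar=False: columns are stocks, rows are trading days
        yield t, np.corrcoef(block, rowvar=False)

# Synthetic example (hypothetical data): 60 days of prices for 5 stocks
rng = np.random.default_rng(0)
prices = 10.0 * np.exp(np.cumsum(rng.normal(0.0, 0.02, size=(60, 5)), axis=0))
G = log_returns(prices)
windows = list(rolling_correlations(G, window=40))
```

With T = 400 and 367 stocks, each window would yield one 367 x 367 matrix, and consecutive windows share all but one trading day.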



Results 

Dynamics of correlation coefficients 

We first analyze the distribution of the elements $c_{ij}$ of the correlation matrix to capture the statistical properties of the correlation coefficients. In Figure 1, the probability density function (PDF) $P(c_{ij})$ of the correlation coefficients is shown as it evolves with time t. We observe that the center of the distribution clearly deviates from zero over the whole range of t. The values of the coefficient $c_{ij}$ at which the peaks of $P(c_{ij})$ are located are significantly positive and vary with t. The peak positions show two local maxima of $c_{ij}$ as t approaches 2003 and 2009, and appear at relatively small $c_{ij}$ for other t.

The Chinese stock market suffered a big crash after the release of the policy of state-held shares sale in listed companies in 2001, and the collapse of the internet bubble also took place in 2000-2001. In 2008, the global financial crisis broke out and hit stock markets around the world, including the Chinese stock market. Considering that the length of the moving window is about two years, the correlations between the stock returns are significantly increased in the time windows 2001-2003 and 2008-2009. This indicates that stock price variations are more likely to be correlated around market crashes.

Figure 8. Contribution $X_l^k$ of conventional industries. $X_l^k$ for $u^1$ obtained from the correlation matrix of the return series of 367 A-Share stocks evolved with time t.
doi:10.1371/journal.pone.0097711.g008

To further verify the dependence of the stock correlations on the time t, we compute the mean correlation coefficient $\langle c_{ij} \rangle$ in the moving window. Figure 2 plots $\langle c_{ij} \rangle$ as a function of the evolving time t; it fluctuates strongly over the whole range of t. By visual inspection of the curve of $\langle c_{ij} \rangle$, we pick out two local maxima on 02/04/2003 and 04/09/2009 and a local minimum on 25/12/2006. There are also some relevant works concerned with extremum values for trend detection in stock price dynamics [44-46]. The moving windows corresponding to the two maxima are from 30/07/2001 to 02/04/2003 and from 17/01/2008 to 04/09/2009, and the window for the minimum is from 10/05/2005 to 25/12/2006. The date 30/07/2001 is close to 26/07/2001, on which the policy of state-held shares sale was formally implemented, and 17/01/2008 is near 21/01/2008, on which the Shanghai Stock Exchange Index dropped more than 5%, followed by a decline of over 7% the next day.

The volatility of the A-share Index of Shanghai Stock Exchange, quantified as the mean absolute return within a moving window of 100 days, is also illustrated in Figure 2. In the periods from 30/07/2001 to 02/04/2003 and from 17/01/2008 to 04/09/2009 the stock market was strongly fluctuating, while in the period from 10/05/2005 to 25/12/2006 the market was in a relatively calm state. In comparison with the variation of $\langle c_{ij} \rangle$, one may conclude that stock correlations are more prominent in volatile periods, showing larger values of $\langle c_{ij} \rangle$ than those in calm periods.
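The two quantities compared in this discussion, the mean correlation coefficient with its standard deviation and the mean absolute index return, might be computed as in the following sketch. The helper names are hypothetical, and taking the statistics over the pairs i < j only is an assumption, since the text does not say whether the diagonal is excluded.

```python
import numpy as np

def offdiag_stats(C):
    """Mean and standard deviation of the coefficients c_ij with i < j."""
    C = np.asarray(C, dtype=float)
    iu = np.triu_indices_from(C, k=1)  # strictly upper triangle
    vals = C[iu]
    return vals.mean(), vals.std()

def rolling_mean_abs_return(index_returns, window):
    """Mean absolute return of an index over a trailing window,
    used here as a simple volatility proxy."""
    r = np.abs(np.asarray(index_returns, dtype=float))
    kernel = np.ones(window) / window
    return np.convolve(r, kernel, mode="valid")

# Toy inputs (hypothetical values)
C = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
mean_c, std_c = offdiag_stats(C)
vol = rolling_mean_abs_return([0.01, -0.03, 0.02, -0.02], window=2)
```

Applying `offdiag_stats` to the matrix of every 400-day window and `rolling_mean_abs_return` to the index with a 100-day window would give the two curves that are being compared.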



Dynamics of eigenvalues and their explanations of system variance

We compute the eigenvalues of the correlation matrix C with $N \times N$ elements, and denote them as $\lambda_k$, $k = 1, \dots, N$, with $\lambda_1 > \lambda_2 > \dots > \lambda_N$. We investigate the probability density function (PDF) of the eigenvalues and its variation over time t. In Figure 3, the PDF $P(\lambda)$ for $\lambda < 20$ evolved with t is plotted. The peaks of $P(\lambda)$ show larger values for t around 2003 and 2009 than those for other t. $P(\lambda)$ for large eigenvalues $\lambda > 20$ is plotted in Figure 4. The largest eigenvalue evolves with time t, and shows larger values, i.e., $\lambda_1 > 200$, in the time windows 2001-2003 and 2008-2009. This is consistent with the two local maxima of $\langle c_{ij} \rangle$ found in the moving windows from 30/07/2001 to 02/04/2003 and from 17/01/2008 to 04/09/2009.

In the observation of $P(\lambda)$, we note that there exist eigenvalues clearly larger than the eigenvalues of the random correlation matrix. To compare the eigenvalues of the stock correlation matrix with those of the random correlation matrix, we show the analytical result for random matrices following Ref. [47]. For the correlation matrix of N random time series of length L, the PDF $P(\lambda)$ of the eigenvalues $\lambda$ in the limit $N \to \infty$ and $L \to \infty$ is given by

$$P(\lambda) = \frac{Q}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}, \qquad (3)$$

where $Q = L/N > 1$, and $\lambda$ is within the bounds $\lambda_{\min} \le \lambda \le \lambda_{\max}$. $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum eigenvalues of the random correlation matrix, which are given by

$$\lambda_{\max,\min} = 1 + \frac{1}{Q} \pm 2\sqrt{\frac{1}{Q}}. \qquad (4)$$
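As a quick numerical check of the random-matrix prediction, the sketch below evaluates the eigenvalue density and its bounds. It assumes the standard bounds $\lambda_{\max,\min} = 1 + 1/Q \pm 2\sqrt{1/Q}$, which give $\lambda_{\max} \approx 3.83$ for L = 400 and N = 367; the function names are hypothetical.

```python
import numpy as np

def rmt_bounds(L, N):
    """lambda_min and lambda_max of the random correlation matrix
    for Q = L / N > 1."""
    Q = L / N
    lam_min = 1.0 + 1.0 / Q - 2.0 * np.sqrt(1.0 / Q)
    lam_max = 1.0 + 1.0 / Q + 2.0 * np.sqrt(1.0 / Q)
    return lam_min, lam_max

def rmt_density(lam, L, N):
    """P(lambda) of Equation (3); zero outside [lambda_min, lambda_max]."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    Q = L / N
    lam_min, lam_max = rmt_bounds(L, N)
    inside = (lam > lam_min) & (lam < lam_max)
    p = np.zeros_like(lam)
    p[inside] = (Q / (2.0 * np.pi)
                 * np.sqrt((lam_max - lam[inside]) * (lam[inside] - lam_min))
                 / lam[inside])
    return p

lam_min, lam_max = rmt_bounds(L=400, N=367)
# Riemann-sum check that the density integrates to about one
grid = np.linspace(lam_min, lam_max, 200001)
area = np.sum(rmt_density(grid, 400, 367)) * (grid[1] - grid[0])
```

Because Q is only slightly above one here, the lower bound is very close to zero and the density is sharply peaked near it, which is why a fine grid is used for the normalization check.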
Figure 9. Contribution $X_l^k$ of conventional industries. (a) $X_l^k$ for $u^2$, (b) $X_l^k$ for $u^3$, (c) $X_l^k$ for $u^4$, and (d) $X_l^k$ for $u^5$, each obtained from the correlation matrix of the return series of 367 A-Share stocks evolved with time t.
doi:10.1371/journal.pone.0097711.g009



In Figure 5, we plot $P(\lambda)$ of the random correlation matrix with finite L = 400 and N = 367, the same as those of the stock return series. Within the bounds $[\lambda_{\min}, \lambda_{\max}]$, $P(\lambda)$ of the correlation matrix constructed from the empirical return series in the first moving window (black solid line) is consistent with the analytical result of equation (3) (red dashed line). There also exist some deviating large eigenvalues. In particular, the largest eigenvalue $\lambda_1 \approx 120$, shown in the inset of Figure 5, is about 31 times larger than $\lambda_{\max} = 3.83$.

We next identify the eigenvalues of the stock correlation matrix which deviate from those of the random correlation matrix, and investigate their variations over time t. The analytical result of RMT is strictly valid only for $N \to \infty$ and $L \to \infty$. Instead, we compare $\lambda$ of the stock correlation matrix with $\lambda$ of the correlation matrix constructed from N = 367 uncorrelated time series of length L = 400. The uncorrelated time series are generated by shuffling the empirical return series, which destroys the equal-time correlations between the original return series. We compute the cross-correlations between these shuffled return series, and use this surrogate correlation matrix as a random correlation matrix. In Figure 6, the black circled line denotes the 99th percentile of the eigenvalues calculated from the random correlation matrix. It stays relatively constant at about 3 as the time t evolves. This means that 99 percent of the eigenvalues of the random correlation matrix are less than this value.
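The shuffling procedure can be illustrated with the following sketch. Several details are assumptions rather than statements of the original method: each stock's series is permuted independently, the eigenvalues of all shuffles are pooled before the percentile is taken, and the one-factor test data are synthetic. The function names are hypothetical.

```python
import numpy as np

def shuffled_eigenvalue_threshold(returns, n_shuffles=10, q=99.0, seed=0):
    """Percentile q of the eigenvalues of correlation matrices built from
    returns whose columns (stocks) are independently permuted in time."""
    rng = np.random.default_rng(seed)
    returns = np.asarray(returns, dtype=float)
    pooled = []
    for _ in range(n_shuffles):
        # Permuting each stock separately destroys equal-time
        # cross-correlations while keeping each marginal distribution.
        shuffled = np.column_stack(
            [rng.permutation(returns[:, j]) for j in range(returns.shape[1])]
        )
        C = np.corrcoef(shuffled, rowvar=False)
        pooled.append(np.linalg.eigvalsh(C))
    return np.percentile(np.concatenate(pooled), q)

def count_significant(returns, threshold):
    """Number of empirical eigenvalues above the surrogate threshold."""
    C = np.corrcoef(np.asarray(returns, dtype=float), rowvar=False)
    return int(np.sum(np.linalg.eigvalsh(C) > threshold))

# Hypothetical one-factor returns: a common "market" term plus noise
rng = np.random.default_rng(1)
market = rng.normal(size=(400, 1))
returns = 0.7 * market + rng.normal(size=(400, 30))
thr = shuffled_eigenvalue_threshold(returns)
n_sig = count_significant(returns, thr)
```

In this synthetic example the common factor produces one eigenvalue far above the surrogate threshold; repeating the count for every moving window would give a curve of the kind described for Figure 6.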

If an eigenvalue of the empirical correlation matrix is larger than the 99th percentile of the eigenvalues generated from the shuffled return series, it is considered to be significantly larger than the eigenvalues of the random correlation matrix. In Figure 6, the number of such significant eigenvalues is plotted as the red squared line. This number fluctuates over time t. For t around the two dates 02/04/2003 and 04/09/2009, it shows a minimal value of about 5, while for t around 25/12/2006, it shows a maximal value of about 16. This means that the number of significant eigenvalues in the volatile periods close to 30/07/2001-02/04/2003 and 17/01/2008-04/09/2009 is smaller than that in the calm period close to 10/05/2005-25/12/2006. To further illustrate the volatile and calm periods of the A-share market, we also plot in the figure the index composed of all A-share stocks. The crashes of 2001-2003 and 2008-2009 seem to start from mid-2001 and early 2008 respectively, and the index keeps dropping for long periods afterwards. Between these two crashes, there is a calm period from mid-2005 to late 2006, in which the A-share index shows a local minimum in mid-2005 and relatively small values until late 2006.

Figure 10. Projection $G^1(t)$ of the 367 stock returns on the largest eigenvector $u^1$. $G^1(t)$ obtained from the correlation matrices of the return series in the moving windows ended on 06/09/2000, 02/04/2003, 25/12/2006, 13/01/2009, 11/05/2010, and 30/12/2011, as a function of the return of the A-share Index of Shanghai Stock Exchange. The A-share Index is composed of all the A-share stocks listed on the Shanghai Stock Exchange. Curves are removed for clarity. A linear regression between the two normalized axes for different moving windows yields slopes: 0.99, 0.99, 0.88, 0.88, 0.90, and 0.91.
doi:10.1371/journal.pone.0097711.g010

We give a cursory explanation for the above phenomenon. It can easily be proved that the sum of the eigenvalues of the stock correlation matrix is fixed to be the number of sample stocks, i.e., $\sum_{k=1}^{N} \lambda_k = N$. As shown in the distribution of the eigenvalues, the major portion of eigenvalues are distributed in the region $\lambda < 3$, and the large eigenvalues $\lambda > 20$ close to the market crashes of 2001-2003 and 2008-2009 are prominently larger than those during the calm period. Therefore, the number of eigenvalues in the range $3 < \lambda < 20$ during crashes is smaller than in calm periods. This may indicate that only a few of the eigenvalues contain the information about the stock correlations when the market strongly fluctuates.
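The fixed sum of eigenvalues follows in one line from two standard facts: the trace of a matrix equals the sum of its eigenvalues, and every diagonal element of a correlation matrix is $c_{ii} = 1$.

```latex
\sum_{k=1}^{N} \lambda_k \;=\; \operatorname{tr}(C) \;=\; \sum_{i=1}^{N} c_{ii} \;=\; N .
```

With N = 367 fixed, a very large $\lambda_1$ during a crash necessarily leaves less total weight for all the remaining eigenvalues.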

The commonality among the stock returns can also be detected by the PCA method, which has a close link to the RMT method. In fact, the systemic risk measured by the collective behavior of stock price movements based on PCA has been analyzed in many studies [6-8]. Risk estimation is also valuable for portfolio optimization, and some work has been done to analyze the risk-return relationship [48,49]. The PCA method decomposes the returns of a sample of stocks into several orthogonal principal components. The principal components $\zeta_k$ are uncorrelated, and satisfy the condition $\langle \zeta_k \zeta_l \rangle = \lambda_k$ if $k = l$ (and zero otherwise), where $\lambda_k$ is the k-th eigenvalue of the correlation matrix C of stock returns. The standardized return of stock i, defined as $z_i = [G_i(t) - \langle G_i(t) \rangle]/\sigma_i$, can be expressed as a linear combination of the principal components $\zeta_k$,

$$z_i = \sum_{k=1}^{N} L_{ik} \zeta_k, \qquad (5)$$

where N = 367 is the total number of stocks analyzed, and $L_{ik}$ is the component of the k-th eigenvector corresponding to stock i, which is also known as the factor loading of $\zeta_k$ for stock i. The total variance of the return series is

$$\sigma^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} \sigma_i \sigma_j L_{ik} L_{jk} \lambda_k, \qquad (6)$$

in which the total variance is decomposed into the orthogonal factor loadings L and the eigenvalues $\lambda$. For periods in which stocks are highly correlated and collectively volatile, a small number $n < N$ of eigenvalues can explain most of the volatility in the system.
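The properties of the principal components stated above can be verified numerically. The following sketch uses synthetic returns and takes the factor loadings to be the eigenvector components of the correlation matrix; the data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical correlated returns: 500 days, 4 stocks
common = rng.normal(size=(500, 1))
G = 0.6 * common + rng.normal(size=(500, 4))

# Standardize each series: z_i = (G_i - <G_i>) / sigma_i
Z = (G - G.mean(axis=0)) / G.std(axis=0)

C = np.corrcoef(G, rowvar=False)
lam, U = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]       # reorder so lambda_1 is the largest
lam, U = lam[order], U[:, order]

# Principal components: zeta_k(t) = sum_i U[i, k] * z_i(t)
zeta = Z @ U

# Sample second moments of the components (population normalization)
cov_zeta = zeta.T @ zeta / Z.shape[0]

# Reconstruction z_i = sum_k L_ik * zeta_k with loadings L = U
Z_rebuilt = zeta @ U.T
```

Here `cov_zeta` is diagonal with the eigenvalues on its diagonal, and `Z_rebuilt` reproduces the standardized returns, which is the content of Equation (5).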

The cumulative risk fraction (CRF) is generally used to quantify the proportion of total variance explained by the first n principal components [7], also known as the absorption ratio in [8]. It is defined as

$$\mathrm{CRF} = \frac{\sum_{k=1}^{n} \lambda_k}{\sum_{k=1}^{N} \lambda_k}, \qquad (7)$$

where $\lambda_k$ is the k-th eigenvalue, $\lambda_1 > \lambda_2 > \dots > \lambda_N$. Since the PCA is done through the decomposition of the correlation (covariance) matrix of the return (standardized return) series, the total variance of the system explained by all N principal components is quantified as $\sum_{k=1}^{N} \lambda_k$. The variance associated with the first n principal components is quantified as $\sum_{k=1}^{n} \lambda_k$. The CRF is the ratio of these two quantities.
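A minimal sketch of the CRF computation is given below, using a toy correlation matrix whose eigenvalues are known in closed form. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def cumulative_risk_fraction(C, n):
    """Share of total variance carried by the n largest eigenvalues
    of the correlation matrix C."""
    lam = np.sort(np.linalg.eigvalsh(np.asarray(C, dtype=float)))[::-1]
    return lam[:n].sum() / lam.sum()

# Toy matrix: three series with a uniform correlation of 0.5,
# whose eigenvalues are 2, 0.5, and 0.5
C = np.full((3, 3), 0.5)
np.fill_diagonal(C, 1.0)
crf1 = cumulative_risk_fraction(C, 1)
crf3 = cumulative_risk_fraction(C, 3)
```

For this matrix the first component alone carries two thirds of the variance, and taking all components gives a fraction of one, as it must for n = N.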

In Figure 7, the CRFs for n = 1, 10, 50, and 367 are shown as a function of the evolving time t. The CRF displays two local maxima at t near 02/04/2003 and 04/09/2009, at which it explains more than 50%, 60%, and 80% of the total variance for n = 1, 10, and 50 respectively. This indicates that the stocks are highly correlated in the moving windows from 30/07/2001 to 02/04/2003 and from 17/01/2008 to 04/09/2009, in which the majority of the stock returns tend to move together. Thus the stock market is at a high level of systemic risk. We also observe that the CRF displays a relatively small value in the moving window from 10/05/2005 to 25/12/2006, in which the stocks are less correlated. These results are consistent with those observed for the mean correlation coefficient.

Evolution of eigenvectors and their interpretations 

To analyze the information contained in the deviating eigenvectors, we first investigate the contributions of the eigenvector components grouped by conventional industry. According to the China Securities Regulatory Commission (CSRC) industry code, the stocks traded on the Shanghai Stock Exchange are grouped into the A-M conventional industries. Table 1 presents summary statistics of the 22 industries, including the industry codes, industry names, and the number of chosen stocks belonging to each industry. For each deviating eigenvector $u^k$, with element $u_i^k$ as the component of the k-th eigenvector corresponding to stock i, we calculate the contribution of each industry group
Table 2. Largest ten components of deviating eigenvectors in the period from 06/09/2000 to 02/04/2003.

| $u^2$: Stock code | Industry | Industry code | $u^3$: Stock code | Industry | Industry code |
|---|---|---|---|---|---|
| 600718 | IT | G | 600613 | Pharmaceuticals | C8 |
| 600098 | Utilities | D | 600845 | IT | G |
| 600832 | Comprehensive | M | 600614 | Real estate | J |
| 600198 | IT | G | 600822 | Wholesale & retail trade | H |
| 600657 | Real estate | J | 600843 | Machinery | C7 |
| 600637 | Electronics | C5 | 600619 | Machinery | C7 |
| 600100 | IT | G | 600818 | Machinery | C7 |
| 600776 | IT | G | 600618 | Petrochemicals | C4 |
| 600138 | Social services | K | 600841 | Machinery | C7 |
| 600770 | IT | G | 600688 | Petrochemicals | C4 |

| $u^4$: Stock code | Industry | Industry code | $u^5$: Stock code | Industry | Industry code |
|---|---|---|---|---|---|
| 600623 | Petrochemicals | C4 | 600773 | Real estate | J |
| 600695 | Food & beverage | C0 | 600847 | Machinery | C7 |
| 600618 | Petrochemicals | C4 | 600818 | Machinery | C7 |
| 600614 | Real estate | J | 600647 | Real estate | J |
| 600613 | Pharmaceuticals | C8 | 600696 | Real estate | J |
| 600612 | Other manufacturing | C99 | 600770 | IT | G |
| 600886 | Utilities | D | 600058 | Wholesale & retail trade | H |
| 600821 | Wholesale & retail trade | H | 600608 | Metals & non-metals | C6 |
| 600079 | Pharmaceuticals | C8 | 600055 | Machinery | C7 |
| 600841 | Machinery | C7 | 600792 | Petrochemicals | C4 |

Largest ten components of $u^2$, $u^3$, $u^4$, and $u^5$ by the average ranks of the eigenvector components taken over the moving windows with ending dates from 06/09/2000 to 02/04/2003. The eigenvectors are obtained from the correlation matrices of the return series in these moving windows. The stock codes corresponding to the largest ten components, the industries they belong to, and the industry codes are listed.
doi:10.1371/journal.pone.0097711.t002



$$X_l^k = \frac{1}{N_l} \sum_{i \in l} [u_i^k]^2, \qquad (8)$$

where $N_l$ is the number of stocks belonging to industry group l, $l = 1, \dots, 22$. The measure $X_l^k$ is analogous to the analysis of wave functions in disordered systems, and was first introduced to financial data analysis in Ref. [50].
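The industry contribution can be sketched as follows, assuming it is the mean squared eigenvector component within each industry group, in the spirit of the measure introduced in Ref. [50]. The function name, the toy eigenvector, and the industry labels are hypothetical.

```python
import numpy as np

def industry_contributions(u, labels):
    """For each group l, (1 / N_l) * sum of u_i**2 over stocks i in l.

    u: components of one eigenvector, shape (n_stocks,).
    labels: industry code of each stock, same length as u.
    Returns a dict mapping industry code to its contribution.
    """
    u = np.asarray(u, dtype=float)
    labels = np.asarray(labels)
    return {
        str(code): float(np.mean(u[labels == code] ** 2))
        for code in np.unique(labels)
    }

# Hypothetical normalized eigenvector over five stocks in two industries
u = np.array([0.6, 0.6, 0.4, 0.2, 0.2])
u = u / np.linalg.norm(u)
X = industry_contributions(u, ["G", "G", "G", "J", "J"])
```

Dividing by the group size makes industries with very different numbers of stocks comparable; for a normalized eigenvector, the contributions weighted by group size sum to one.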

Figure 8 shows $X_l^k$ for the deviating eigenvector $u^1$ as it evolves with time $t$. The participants of the eigenvectors listed on the horizontal axis are the 367 stocks. The stocks belonging to industry group $l$ are endowed with the same value of $X_l^k$, and are ranked by their capitalizations on the ending date of the sample data. We find that $X_l^k$ for the largest eigenvector $u^1$ universally shows large values among different industries, which means that almost all the industries contribute significantly to $u^1$. This is quite robust for different $t$. In Figure 9, $X_l^k$ for $u^2$, $u^3$, $u^4$, and $u^5$ show different patterns in the periods divided by the date points 02/04/2003, 25/12/2006, and 04/09/2009. In addition, $X_l^k$ before and after the two date points 13/01/2009 and 11/05/2010, which are the ending dates of the moving windows starting from 30/05/2007 and 16/09/2008 respectively, show remarkably different patterns.
These discrete patterns can be easily observed for u 2 and it. The 
Shanghai Stock Exchange fell 6.5% on 30/05/2007, which was caused by an increase in the stamp tax on stock transactions from 0.1% to 0.3%. The bankruptcy of Lehman Brothers on 14/09/2008 indicated that the financial crisis had entered an acute phase, and the Chinese stock market started to be affected by the global financial crisis after that, showing a 4.5% fall on 16/09/2008. Therefore, we choose the ending dates of these two moving windows, i.e., 13/01/2009 and 11/05/2010, as additional dividing dates. The date points 02/04/2003, 25/12/2006, 13/01/2009, and 11/05/2010 are picked as coarse-grained dividing points.

We next analyze the contributions of industries in different time 
periods separated by the four dividing dates. As shown in Figure 9, 
u 2 and u s show large values of Xf for the electronics and IT 
industries respectively in the first period from 06/09/2000 to 02/04/2003. In the following period from 02/04/2003 to 25/12/2006, the mining, electronics, and real estate industries contribute significantly to $u^2$, $u^3$, and $u^5$ respectively. The real estate industry is a significant contributor to $u^4$ in the period from 25/12/2006 to 13/01/2009, and to $u^3$ and $u^4$ from 13/01/2009 to 11/05/2010. In the last period from 11/05/2010 to 30/12/2011, both the real estate and pharmaceuticals industries contribute significantly to $u^2$, and the mining industry is a significant contributor to $u^3$. It is worth noting that $X_l^k$ of the finance & insurance and other manufacturing industries displays large values for $u^2$-$u^5$. We neglect their contributions to the deviating
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Table 3. Largest ten components of deviating eigenvectors in period from 02/04/2003 to 25/12/2006. 



| Stock code ($u^2$) | Industry | Industry code | Stock code ($u^3$) | Industry | Industry code |
|---|---|---|---|---|---|
| 600123 | Mining | B | 600171 | Electronics | C5 |
| 600009 | Transportation | F | 600602 | Electronics | C5 |
| 600848 | Machinery | C7 | 600100 | IT | G |
| 600098 | Utilities | D | 600088 | Communication & cultural industry | L |
| 600695 | Food & beverage | C0 | 600832 | Comprehensive | M |
| 600740 | Petrochemicals | C4 | 600775 | IT | G |
| 600642 | Utilities | D | 600770 | IT | G |
| 600795 | Utilities | D | 600624 | Comprehensive | M |
| 600096 | Petrochemicals | C4 | 600198 | IT | G |
| 600649 | Comprehensive | M | 600608 | Metals & non-metals | C6 |

| Stock code ($u^4$) | Industry | Industry code | Stock code ($u^5$) | Industry | Industry code |
|---|---|---|---|---|---|
| 600841 | Machinery | C7 | 600648 | Wholesale & retail trade | H |
| 600818 | Machinery | C7 | 600136 | Real estate | J |
| 600612 | Other manufacturing | C99 | 600823 | Real estate | J |
| 600614 | Real estate | J | 600620 | Real estate | J |
| 600623 | Petrochemicals | C4 | 600781 | Pharmaceuticals | C8 |
| 600822 | Wholesale & retail trade | H | 600807 | Real estate | J |
| 600843 | Machinery | C7 | 600663 | Real estate | J |
| 600610 | Machinery | C7 | 600086 | Other manufacturing | C99 |
| 600618 | Petrochemicals | C4 | 600054 | Social services | K |
| 600604 | Machinery | C7 | 600715 | Machinery | C7 |

Largest ten components of $u^2$, $u^3$, $u^4$, and $u^5$ by the average ranks of the eigenvector components taken over the moving windows with ending dates from 02/04/2003 to 25/12/2006. The eigenvectors are obtained from the correlation matrices of the return series in these moving windows. The stock codes corresponding to the largest ten components, the industries they belong to, and the industry codes are listed.
doi:10.1371/journal.pone.0097711.t003



eigenvectors, since there are only small numbers of chosen stocks 
belonging to these two industries. 

To further confirm the wide influence of the largest eigenvector observed in the contributions of industries, we also calculate the projection of the stock returns $G_j(t)$ on the largest eigenvector $u^1$,

$$G^1(t) = \sum_{j=1}^{N} u_j^1 G_j(t), \qquad (9)$$

where $u_j^1$ is the component of $u^1$ corresponding to stock $j$, and $N$ is the number of sample stocks. In Figure 10, we plot $G^1(t)$ against the return of the A-share Index of the Shanghai Stock Exchange, $G_{\text{A-share}}(t)$, for the moving windows ending on 06/09/2000, 02/04/2003, 25/12/2006, 13/01/2009, 11/05/2010, and 30/12/2011. The A-share Index is composed of all A-share stocks traded on the Shanghai Stock Exchange. The projection $G^1(t)$ is well described by a linear fit, with a narrow scatter around the fitted line in the figure. The slope is about $0.93 \pm 0.06$, with slight quantitative differences between moving windows. This value is slightly larger than the 0.85 observed in [18]. The significant linear correlation between $G^1(t)$ and $G_{\text{A-share}}(t)$ indicates that the largest eigenvalue can be interpreted as quantifying the market-wide influence on all stocks, and it remains quite robust to the variation of $t$. In fact, all the components of $u^1$ are positive in our study, and similar results are revealed in [26]. The A-share Index is a capitalization-weighted average of the prices of all A-share stocks, and large components of $u^1$ are universally distributed among all stocks. Thus it is no surprise to observe the significant correlation between $G^1(t)$ and $G_{\text{A-share}}(t)$.
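The projection in Eq. (9) and its comparison with a market index can be sketched as follows. This is a minimal illustration on synthetic one-factor returns, not the data of the study: the equal-weighted mean used as the index, the standardization of both series before fitting, and the helper names `standardize` and `largest_eigenvector` are all assumptions.

```python
import numpy as np

def standardize(x, axis=0):
    """Shift to zero mean and scale to unit variance along the given axis."""
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)

def largest_eigenvector(returns):
    """Largest eigenvalue and its eigenvector of the correlation matrix.

    returns : array of shape (T, N), one column per stock.
    """
    C = np.corrcoef(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    u1 = eigvecs[:, -1]
    if u1.sum() < 0:                       # the solver's sign is arbitrary
        u1 = -u1
    return eigvals[-1], u1

# Synthetic returns: one common "market" factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
T, N = 400, 20
market = rng.normal(size=T)
returns = 0.7 * market[:, None] + rng.normal(size=(T, N))

lam1, u1 = largest_eigenvector(returns)
G1 = standardize(returns) @ u1             # Eq. (9) on normalized returns
index = returns.mean(axis=1)               # equal-weighted proxy for an index

# Linear fit of the standardized projection against the standardized index.
slope, intercept = np.polyfit(standardize(index), standardize(G1), 1)
```

With both series standardized, the fitted slope equals their correlation coefficient, which makes it comparable across windows. A capitalization-weighted index, as used for the A-share Index, would replace the equal-weighted mean.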

We have offered an overall observation of the contributions of industry groups. For the interpretation of $u^2$-$u^5$, we further analyze the component stocks which significantly contribute to each deviating eigenvector in different time periods divided by the date points 02/04/2003, 25/12/2006, 13/01/2009, and 11/05/2010. Minor adjustments of the dividing dates around these points do not significantly change the results. Tables 2-6 show the stocks and industry groups corresponding to the largest ten components of the deviating eigenvectors $u^2$, $u^3$, $u^4$, and $u^5$ by the average ranks of the eigenvector components in different time periods. We rank the components according to their eigenvector component values, and average the ranks of the components over the moving windows with ending dates from 06/09/2000 to 02/04/2003, from 02/04/2003 to 25/12/2006, from 25/12/2006 to 13/01/2009, from 13/01/2009 to 11/05/2010, and from 11/05/2010 to 30/12/2011. The components with the smallest ten average ranks are picked as the largest ten components. The largest ten components correspond to ten stocks which significantly contribute to the relevant eigenvectors.
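The ranking procedure described above can be sketched as follows, assuming the eigenvectors of one period are stacked row by row with a consistent sign convention across windows. The function name `top_components_by_average_rank` and the toy numbers are hypothetical.

```python
import numpy as np

def top_components_by_average_rank(eigvec_windows, n_top=10):
    """Pick the components with the smallest average rank over the windows.

    eigvec_windows : array of shape (W, N), one eigenvector per moving window
    Returns (indices of the n_top components, average rank of every component).
    """
    V = np.asarray(eigvec_windows, dtype=float)
    n_windows, n_stocks = V.shape
    # Within each window, rank 1 is assigned to the largest component value.
    order = np.argsort(-V, axis=1)
    ranks = np.empty_like(order)
    rows = np.arange(n_windows)[:, None]
    ranks[rows, order] = np.arange(1, n_stocks + 1)
    avg_rank = ranks.mean(axis=0)
    top = np.argsort(avg_rank, kind="stable")[:n_top]
    return top, avg_rank

# Toy example: three windows, three stocks, keep the top two.
V = np.array([[0.1, 0.9, 0.5],
              [0.2, 0.8, 0.7],
              [0.6, 0.7, 0.1]])
top, avg_rank = top_components_by_average_rank(V, n_top=2)
```

In this toy case stock 1 has the largest component in every window (average rank 1), and stock 2 comes second with an average rank of 7/3, so `top` holds the indices 1 and 2.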
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Table 4. Largest ten components of deviating eigenvectors in period from 25/12/2006 to 13/01/2009. 





| Stock code ($u^2$) | Industry | Industry code | Stock code ($u^3$) | Industry | Industry code |
|---|---|---|---|---|---|
| 600660 | Metals & non-metals | C6 | 600610 | Machinery | C7 |
| 600716 | Real estate | J | 600751 | Transportation | F |
| 600773 | Real estate | J | 600711 | Mining | B |
| 600809 | Food & beverage | C0 | 600733 | Real estate | J |
| 600600 | Food & beverage | C0 | 600757 | Textiles & apparel | C1 |
| 600096 | Petrochemicals | C4 | 600695 | Food & beverage | C0 |
| 600761 | Machinery | C7 | 600722 | Petrochemicals | C4 |
| 600875 | Machinery | C7 | 600101 | Utilities | D |
| 600887 | Food & beverage | C0 | 600664 | Pharmaceuticals | C8 |
| 600694 | Wholesale & retail trade | H | 600608 | Metals & non-metals | C6 |

| Stock code ($u^4$) | Industry | Industry code | Stock code ($u^5$) | Industry | Industry code |
|---|---|---|---|---|---|
| 600663 | Real estate | J | 600691 | Metals & non-metals | C6 |
| 600639 | Real estate | J | 600695 | Food & beverage | C0 |
| 600675 | Real estate | J | 600699 | Petrochemicals | C4 |
| 600648 | Wholesale & retail trade | H | 600133 | Construction | E |
| 600638 | Real estate | J | 600724 | Real estate | J |
| 600694 | Wholesale & retail trade | H | 600634 | Real estate | J |
| 600665 | Real estate | J | 600191 | Food & beverage | C0 |
| 600622 | Real estate | J | 600884 | Textiles & apparel | C1 |
| 600732 | Real estate | J | 600757 | Textiles & apparel | C1 |
| 600858 | Wholesale & retail trade | H | 600868 | Utilities | D |

Largest ten components of $u^2$, $u^3$, $u^4$, and $u^5$ by the average ranks of the eigenvector components taken over the moving windows with ending dates from 25/12/2006 to 13/01/2009. The eigenvectors are obtained from the correlation matrices of the return series in these moving windows. The stock codes corresponding to the largest ten components, the industries they belong to, and the industry codes are listed.
doi:10.1371/journal.pone.0097711.t004



If one looks carefully at the stock codes of the largest ten components, the dynamic evolution of conventional stock industries is clearly observed. The stocks belonging to the industries which contribute significantly to distinct eigenvectors also appear in their largest ten components. For the moving windows with ending dates in the period from 06/09/2000 to 02/04/2003, as shown in Table 2, among the largest ten components of $u^2$ five stocks belong to the IT industry and one stock belongs to the electronics industry, and for $u^3$ four stocks belong to the machinery industry and two stocks belong to the petrochemicals industry. In the following period from 02/04/2003 to 25/12/2006, as shown in Table 3, four IT stocks and two electronics stocks are in the list of the largest ten components of $u^3$, and five machinery stocks and two petrochemicals stocks are in the list of $u^4$. More interestingly, stocks 600198, 600100, and 600770, which are among the largest ten components of $u^2$ in the first time period, appear in the largest ten components of $u^3$ in the following period. The starting dates of the moving windows are from 04/01/1999 to 30/07/2001 for the first period, and from 30/07/2001 to 10/05/2005 for the second period. The evolution of the IT and electronics industries recalls the history of the Chinese stock market in the period 1999-2001. During that time, the Chinese stock market was in a bull market, and high-tech stocks issued by companies dealing in IT and electronics were leading the rise. After 2001, the Chinese stock market started to decline, and the IT and electronics stocks are thus contained in $u^3$. A similar phenomenon is observed for the stocks in the machinery and petrochemicals industries: stocks 600843, 600818, 600618, and 600841, which are among the largest ten components of $u^3$ in the first time period, become members of the largest ten components of $u^4$ in the following period.

The dynamic evolution behavior is also observed in the real estate industry. In the period from 02/04/2003 to 25/12/2006, five stocks belonging to the real estate industry appear in the largest ten components of $u^5$. The number of real estate stocks in the largest ten components of $u^4$ increases to seven in the period from 25/12/2006 to 13/01/2009, as shown in Table 4. In the following period from 13/01/2009 to 11/05/2010, five (seven) real estate stocks are in the largest ten components of $u^3$ ($u^4$), as shown in Table 5. After September 2008, the Chinese stock market began to be affected by the global financial crisis, and the stocks belonging to the real estate industry were leading the drop. In Table 6, we observe that seven real estate stocks appear in the largest ten components of $u^2$ for the period from 11/05/2010 to 30/12/2011, in which the moving windows have starting dates from 16/09/2008 to 12/05/2010. In general, the real estate stocks contained in the largest five eigenvectors slowly move to be contained in the second largest eigenvector as the time approaches the global financial crisis. This conclusion is based upon the fact that many real estate stocks appear repeatedly in the largest ten components of the largest five eigenvectors in different periods. For instance, stock 600663 first appears in the largest ten components of $u^5$ in the period from 02/04/2003 to 25/12/2006, then moves to those of $u^4$ in the
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Table 5. Largest ten components of deviating eigenvectors in period from 13/01/2009 to 11/05/2010. 



| Stock code ($u^2$) | Industry | Industry code | Stock code ($u^3$) | Industry | Industry code |
|---|---|---|---|---|---|
| 600728 | IT | G | 600648 | Wholesale & retail trade | H |
| 600890 | Real estate | J | 600732 | Real estate | J |
| 600076 | IT | G | 600620 | Real estate | J |
| 600751 | Transportation | F | 600621 | Electronics | C5 |
| 600891 | Wholesale & retail trade | H | 600622 | Real estate | J |
| 600773 | Real estate | J | 600062 | Pharmaceuticals | C8 |
| 600892 | Machinery | C7 | 600694 | Wholesale & retail trade | H |
| 600757 | Textiles & apparel | C1 | 600716 | Real estate | J |
| 600800 | Comprehensive | M | 600634 | Real estate | J |
| 600714 | Mining | B | 600750 | Pharmaceuticals | C8 |

| Stock code ($u^4$) | Industry | Industry code | Stock code ($u^5$) | Industry | Industry code |
|---|---|---|---|---|---|
| 600067 | Machinery | C7 | 600783 | Metals & non-metals | C6 |
| 600675 | Real estate | J | 600836 | Paper & printing | C3 |
| 600748 | Real estate | J | 600729 | Wholesale & retail trade | H |
| 600895 | Real estate | J | 600828 | Wholesale & retail trade | H |
| 600684 | Real estate | J | 600697 | Wholesale & retail trade | H |
| 600773 | Real estate | J | 600635 | Utilities | D |
| 600064 | Real estate | J | 600779 | Food & beverage | C0 |
| 600109 | Finance & insurance | I | 600887 | Food & beverage | C0 |
| 600665 | Real estate | J | 600624 | Comprehensive | M |
| 600084 | Food & beverage | C0 | 600106 | Transportation | F |

Largest ten components of $u^2$, $u^3$, $u^4$, and $u^5$ by the average ranks of the eigenvector components taken over the moving windows with ending dates from 13/01/2009 to 11/05/2010. The eigenvectors are obtained from the correlation matrices of the return series in these moving windows. The stock codes corresponding to the largest ten components, the industries they belong to, and the industry codes are listed.
doi:10.1371/journal.pone.0097711.t005



following period from 25/12/2006 to 13/01/2009, and finally becomes a member of those of $u^2$ in the latest period from 11/05/2010 to 30/12/2011.

In both analyses, of the contributions of industry groups and of the component stocks, sectors like IT, electronics, machinery, petrochemicals, and real estate are significant contributors and their dynamic evolution is clearly observed. Other sectors which have large industry contributions are also observed in the largest ten components of the same eigenvector in a particular time period. For instance, utilities and mining have large contributions to $u^2$ in the period from 02/04/2003 to 25/12/2006, metals & non-metals and mining have large contributions to $u^4$ in the period from 11/05/2010 to 30/12/2011, and pharmaceuticals has a large contribution to $u^5$ in the period from 11/05/2010 to 30/12/2011.

Discussion and Conclusion 

In summary, we have conducted a thorough study of the evolution of the cross-correlations between the return series of 367 A-share stocks on the Shanghai Stock Exchange from 1999 to 2011. We find that the stock returns behave more collectively in volatile periods, showing a biased distribution of correlation coefficients centered around larger positive coefficients and larger values of the mean correlation coefficient as the time approaches the two big crashes in 2001 and 2008. In the same volatile periods, we find that the largest eigenvalue shows larger values, while the number of eigenvalues that significantly deviate from those of the random correlation matrix is smaller. In addition, only a small number of eigenvalues can explain the major portion of the total system variance when the market is volatile, which indicates a high level of systemic risk.
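The share of total variance carried by the largest eigenvalues can be computed directly from the spectrum of the correlation matrix, whose eigenvalues sum to the number of stocks. The sketch below uses synthetic one-factor returns to contrast a weakly and a strongly coupled market; the factor loadings, the choice `k=1`, and the function name `variance_explained` are assumptions for illustration, not the risk measure of the study.

```python
import numpy as np

def variance_explained(returns, k=5):
    """Fraction of total variance carried by the k largest eigenvalues.

    returns : array of shape (T, N), one column per stock.
    """
    C = np.corrcoef(returns, rowvar=False)
    eigvals = np.linalg.eigvalsh(C)[::-1]   # sorted in descending order
    return float(eigvals[:k].sum() / eigvals.sum())

def one_factor_returns(rng, T, N, loading):
    """Synthetic returns: a common factor with the given loading plus noise."""
    market = rng.normal(size=T)
    return loading * market[:, None] + rng.normal(size=(T, N))

rng = np.random.default_rng(1)
T, N = 400, 30                               # one 400-day window, 30 toy stocks
calm = one_factor_returns(rng, T, N, loading=0.3)
volatile = one_factor_returns(rng, T, N, loading=1.0)

frac_calm = variance_explained(calm, k=1)
frac_volatile = variance_explained(volatile, k=1)
```

A stronger common factor concentrates variance in the first eigenvalue, so `frac_volatile` exceeds `frac_calm`, mirroring the qualitative statement that fewer eigenvalues explain most of the variance when stocks move together.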

For the interpretation of deviating eigenvectors, we have further analyzed the eigenvector components and their contributions. By computing the contributions of the components grouped in conventional industries, we find significant contributors, such as mining, electronics, IT, and real estate, for distinct eigenvectors at different times $t$. We also analyze the projection of the stock returns on the largest eigenvector, and confirm the market-wide influence of the largest eigenvector and its stability in time. In the analysis of the component stocks which significantly contribute to each eigenvector, the dynamic evolution of conventional industries is observed, basically consistent with the results of industry contributions. The stocks in the IT and electronics industries that contribute significantly to the second largest eigenvector before the crash in 2001 become significant contributors to the third largest eigenvector after the crash. Similarly, the stocks in the real estate industry that contribute significantly to other deviating eigenvectors before the crisis of 2008-2009 become significant contributors to the second largest eigenvector during the crisis period.

We offer a new interpretation of the deviating eigenvectors of 
the correlation matrices in the Chinese stock market. It is revealed 
that the information contained in a particular eigenvector varies 
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Table 6. Largest ten components of deviating eigenvectors in period from 11/05/2010 to 30/12/2011. 





| Stock code ($u^2$) | Industry | Industry code | Stock code ($u^3$) | Industry | Industry code |
|---|---|---|---|---|---|
| 600748 | Real estate | J | 600058 | Wholesale & retail trade | H |
| 600067 | Machinery | C7 | 600757 | Textiles & apparel | C1 |
| 600823 | Real estate | J | 600188 | Mining | B |
| 600675 | Real estate | J | 600838 | Wholesale & retail trade | H |
| 600657 | Real estate | J | 600117 | Metals & non-metals | C6 |
| 600664 | Pharmaceuticals | C8 | 600692 | Transportation | F |
| 600639 | Real estate | J | 600699 | Petrochemicals | C4 |
| 600052 | Real estate | J | 600675 | Real estate | J |
| 600663 | Real estate | J | 600606 | Real estate | J |
| 600102 | Metals & non-metals | C6 | 600631 | Wholesale & retail trade | H |

| Stock code ($u^4$) | Industry | Industry code | Stock code ($u^5$) | Industry | Industry code |
|---|---|---|---|---|---|
| 600757 | Textiles & apparel | C1 | 600680 | IT | G |
| 600751 | Transportation | F | 600809 | Food & beverage | C0 |
| 600180 | Agriculture | A | 600651 | Machinery | C7 |
| 600102 | Metals & non-metals | C6 | 600607 | Wholesale & retail trade | H |
| 600891 | Wholesale & retail trade | H | 600756 | IT | G |
| 600608 | Metals & non-metals | C6 | 600666 | Pharmaceuticals | C8 |
| 600123 | Mining | B | 600640 | Wholesale & retail trade | H |
| 600604 | Machinery | C7 | 600085 | Pharmaceuticals | C8 |
| 600188 | Mining | B | 600812 | Pharmaceuticals | C8 |
| 600699 | Petrochemicals | C4 | 600654 | Electronics | C5 |

Largest ten components of $u^2$, $u^3$, $u^4$, and $u^5$ by the average ranks of the eigenvector components taken over the moving windows with ending dates from 11/05/2010 to 30/12/2011. The eigenvectors are obtained from the correlation matrices of the return series in these moving windows. The stock codes corresponding to the largest ten components, the industries they belong to, and the industry codes are listed.
doi:10.1371/journal.pone.0097711.t006



over time, which is different from the fixed sectors and sub-sectors observed in static correlation analysis. The dynamic evolution of significant eigenvector contributors reminds us of the sector rotation commonly observed in financial markets. This work is valuable for the understanding of risk propagation among interconnected stocks and the classification of stock sectors, and can be further applied to portfolio risk estimation and systemic risk management.
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